71 research outputs found

State of the art in selection of variables and functional forms in multivariable analysis-outstanding issues

Author: Abrahamowicz M
Becher H
Binder H
Dunkler D
for TG2 of the STRATOS initiative
Harrell FE
Heinze G
Perperoglou A
Royston P
Sauerbrei W
Schmid M
Publication date: 02/04/2020

Background: How to select variables and identify functional forms for continuous variables is a key concern when creating a multivariable model. Ad hoc ‘traditional’ approaches to variable selection have been in use for at least 50 years. Similarly, methods for determining functional forms for continuous variables were first suggested many years ago. More recently, many alternative approaches to address these two challenges have been proposed, but knowledge of their properties and meaningful comparisons between them are scarce. Many outstanding issues in multivariable modelling must be resolved before a state of the art can be defined and evidence-supported guidance can be given to researchers who have only a basic level of statistical knowledge. Our main aims are to identify and illustrate such gaps in the literature and present them at a moderate technical level to the wide community of practitioners, researchers and students of statistics. Methods: We briefly discuss general issues in building descriptive regression models, strategies for variable selection, different ways of choosing functional forms for continuous variables and methods for combining the selection of variables and functions. We discuss two examples, taken from the medical literature, to illustrate problems in the practice of modelling. Results: Our overview revealed that there is not yet enough evidence on which to base recommendations for the selection of variables and functional forms in multivariable analysis. Such evidence may come from comparisons between alternative methods. In particular, we highlight seven important topics that require further investigation and make suggestions for the direction of further research. Conclusions: Selection of variables and of functional forms are important topics in multivariable analysis. To define a state of the art and to provide evidence-supported guidance to researchers who have only a basic level of statistical knowledge, further comparative research is required.

UCL Discovery

Stepwise classification of cancer samples using clinical and molecular data

Author: A Tan
AL Boulesteix
Askar Obulkasim
D Dunkler
D Krag
Gerrit A Meijer
JA Stephenson
JR Tibshirani
KA Cao
L Breiman
M Bovelstad
M Futschik
M Jelizarow
M van de Vijver
Mark A van de Wiel
RJ Nevins
SL Pomeroy
Y Qi
Z Yong
ZX Huang
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: Combining clinical and molecular data types may improve the prediction accuracy of a classifier. However, there is currently a shortage of effective and efficient statistical and bioinformatic tools for true integrative data analysis. Existing integrative classifiers have two main disadvantages. First, coarse combination may cause subtle contributions of one data type to be overshadowed by more obvious contributions of the other. Second, the need to measure both data types for all patients may be both impractical and (cost) inefficient. Results: We introduce a novel classification method, a stepwise classifier, which takes advantage of the distinct classification power of clinical data and high-dimensional molecular data. We apply classification algorithms to the two data types independently, starting with the traditional clinical risk factors. We only turn to the relatively expensive molecular data when the uncertainty of the prediction from clinical data exceeds a predefined limit. Experimental results show that our approach is adaptive: the proportion of samples that needs to be re-classified using molecular data depends on how much we expect the predictive accuracy to increase when re-classifying those samples. Conclusions: Our method renders a more cost-efficient classifier that is at least as good as, and sometimes better than, one based on clinical or molecular data alone. Hence our approach is not just a classifier that minimizes a particular loss function. Instead, it aims to be cost-efficient by avoiding molecular tests for a potentially large subgroup of individuals; moreover, for these individuals a test result would be quickly available, which may lead to reduced waiting times (for diagnosis) and hence lower patients' distress. Stepwise classification is implemented in the R package stepwiseCM, available at the Bioconductor website.
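
The two-stage decision rule described in this abstract can be sketched roughly as follows. This is only an illustration in Python, not the stepwiseCM implementation (which is an R package): the function name `stepwise_classify`, the uncertainty score and the default `uncertainty_limit` are assumptions made for this example, and the two probability functions stand in for whatever clinical and molecular classifiers are actually fitted.

```python
from typing import Callable, Dict, List, Tuple

Sample = Dict[str, float]


def stepwise_classify(
    samples: List[Sample],
    clinical_prob: Callable[[Sample], float],
    molecular_prob: Callable[[Sample], float],
    uncertainty_limit: float = 0.2,
) -> List[Tuple[int, bool]]:
    """Return (predicted_label, used_molecular_data) for each sample."""
    results = []
    for sample in samples:
        # Stage 1: always start from the cheap clinical prediction.
        p = clinical_prob(sample)
        # Hypothetical uncertainty score: 0 when p is 0 or 1, 1 when p is 0.5.
        uncertainty = 1.0 - abs(2.0 * p - 1.0)
        needs_molecular = uncertainty > uncertainty_limit
        if needs_molecular:
            # Stage 2: fall back to the expensive molecular prediction.
            p = molecular_prob(sample)
        results.append((int(p >= 0.5), needs_molecular))
    return results


# Toy usage with precomputed class-1 probabilities for each data type.
toy_samples = [
    {"clin": 0.95, "mol": 0.10},
    {"clin": 0.55, "mol": 0.10},
    {"clin": 0.05, "mol": 0.90},
]
predictions = stepwise_classify(
    toy_samples, lambda s: s["clin"], lambda s: s["mol"]
)
print(predictions)
```

In this toy run only the second sample, whose clinical probability is close to 0.5, is passed on to the molecular classifier; raising `uncertainty_limit` reduces the share of samples that need molecular testing.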

Crossref

VU Research Portal

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Gene expression of PMP22 is an independent prognostic factor for disease-free and overall survival in breast cancer patients

Author: A Tutt
AM Jetten
Andrea Wolf
Christian F Singer
D Dunkler
D Tong
Dan Tong
Dietmar Pils
DM Parkin
DS Douglas
FD Kury
FE Harrell Jr
G Heinze
Georg Heinze
Gerda Hofstetter
Ingrid Schiebel
J Li
K Hühne
M Schemper
M van Dartel
Margaretha Rudas
Nicole Concin
PE Lønning
PM Grambsch
Robert Zeillinger
V Evtimova
Publication venue: BioMed Central
Publication date: 01/12/2010

Background: Peripheral myelin protein 22 (PMP22) and the epithelial membrane proteins (EMPs) were found to be differentially expressed in invasive and non-invasive breast cell lines in a previous study. We aimed to evaluate the prognostic impact of the expression of these genes in breast cancer. Methods: In a retrospective multicenter study, gene expression of PMP22 and the EMPs was measured in 249 primary breast tumors by real-time PCR. Results were analyzed statistically together with clinical data. Results: In univariable Cox regression analyses, PMP22 and the EMPs were not associated with disease-free survival or tumor-related mortality. However, multivariable Cox regression revealed that patients with higher than median PMP22 gene expression have a 3.47 times higher risk of dying of cancer than patients with equal values on clinical covariables but lower PMP22 expression. They also have a 1.77 times higher risk of relapse than those with lower PMP22 expression. The proportion of explained variation in overall survival due to PMP22 gene expression was 6.5%; PMP22 thus contributes as much to the prognosis of overall survival as nodal status and estrogen receptor status. Cross-validation demonstrates that 5-year survival rates can be refined by incorporating PMP22 into the prediction model. Conclusions: PMP22 gene expression is a novel independent prognostic factor for disease-free survival and overall survival in breast cancer patients. Including it in a model with established prognostic factors will increase the accuracy of prognosis.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Intrinsic bias in breast cancer gene expression data sets

Author: A Naderi
AE Teschendorff
AV Ivshina
B Haibe-Kains
C Desmedt
C Fan
C Sotiriou
D Dunkler
DJ Slamon
H Dai
HM Bovelstad
HY Chang
JD Mosley
JD Potter
Jonathan D Mosley
JX Yu
L Ein-Dor
L Harris
LD Miller
LJ van't Veer
MJ van de Vijver
P Eden
Ruth A Keri
S Gruvberger
S Michiels
SK Gruvberger
SY Kim
T Sorlie
WL McGuire
Y Wang
Y Yasui
Z Zhang
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: While global breast cancer gene expression data sets have considerable commonality in terms of their data content, the populations that they represent and the data collection methods utilized can be quite disparate. We sought to assess the extent and consequence of these systematic differences with respect to identifying clinically significant prognostic groups. Methods: We ascertained how effectively unsupervised clustering employing randomly generated sets of genes could segregate tumors into prognostic groups using four well-characterized breast cancer data sets. Results: Using a common set of 5,000 randomly generated lists (70 genes/list), the percentages of clusters with significant differences in metastasis latencies (HR p-value < 0.01) were 62%, 15%, 21% and 0% in the NKI2 (Netherlands Cancer Institute), Wang, TRANSBIG and KJX64/KJ125 data sets, respectively. Among ER positive tumors, the percentages were 38%, 11%, 4% and 0%, respectively. Few random lists were predictive among ER negative tumors in any data set. Clustering was associated with ER status and, after globally adjusting for the effects of ER-α gene expression, the percentages were 25%, 33%, 1% and 0%, respectively. The impact of adjusting for ER status depended on the extent of confounding between ER-α gene expression and markers of proliferation. Conclusion: A statistically significant association between a given gene list and prognosis is highly likely to be found in the NKI2 data set because of its large sample size and the interrelationship between ER-α expression and markers of proliferation. In most respects, the TRANSBIG data set generated outcomes similar to those of the NKI2 data set, although its smaller sample size led to fewer statistically significant results.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Survival prediction from clinico-genomic models - a comparative study

Author: A Oberthür
AE Hoerl
AE Teschendorff
D Dunkler
DR Cox
E Bair
H Binder
H Höfling
H Martens
HC van Houwelingen
Hege M Bøvelstad
HM Bøvelstad
J Clarke
J Pittman
JP Klein
JR Nevins
L Li
LJ van't Veer
M Campone
M Rosenwald
MH Galea
MJ van de Vijver
MR Segal
MY Park
NJD Nagelkerke
PJM Verweij
Project TINHLPF
R Tibshirani
S Nygård
S Paik
Ståle Nygård
T Hastie
Y Sun
Y Wang
Ørnulf Borgan
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: Survival prediction from high-dimensional genomic data is an active field in today's medical research. Most of the proposed prediction methods make use of genomic data alone without considering established clinical covariates that often are available and known to have predictive value. Recent studies suggest that combining clinical and genomic information may improve predictions, but there is a lack of systematic studies on the topic. Also, for the widely used Cox regression model, it is not obvious how to handle such combined models. Results: We propose a way to combine classical clinical covariates with genomic data in a clinico-genomic prediction model based on the Cox regression model. The prediction model is obtained by a simultaneous use of both types of covariates, but applying dimension reduction only to the high-dimensional genomic variables. We describe how this can be done for seven well-known prediction methods: variable selection, unsupervised and supervised principal components regression and partial least squares regression, ridge regression, and the lasso. We further perform a systematic comparison of the performance of prediction models using clinical covariates only, genomic data only, or a combination of the two. The comparison is done using three survival data sets containing both clinical information and microarray gene expression data. Matlab code for the clinico-genomic prediction methods is available at http://www.med.uio.no/imb/stat/bmms/software/clinico-genomic/. Conclusions: Based on our three data sets, the comparison shows that established clinical covariates will often lead to better predictions than what can be obtained from genomic data alone. In the cases where the genomic models are better than the clinical, ridge regression is used for dimension reduction. We also find that the clinico-genomic models tend to outperform the models based on only genomic data. Further, clinico-genomic models combined with ridge regression give better predictions than models based on the clinical covariates alone for all three data sets.
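
The idea of shrinking only the genomic covariates while leaving the clinical ones unpenalized can be sketched for ridge regression. The sketch below is a deliberately simplified analogue: it uses ordinary least squares rather than the Cox model studied in the paper, it is not the authors' Matlab code, and the function names, the toy data and the penalty value `lam` are assumptions made for illustration only.

```python
from typing import List

Matrix = List[List[float]]


def solve_linear_system(a: Matrix, b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def partially_penalized_ridge(
    x: Matrix, y: List[float], n_clinical: int, lam: float
) -> List[float]:
    """Least squares with a ridge penalty on the non-clinical columns only."""
    p = len(x[0])
    # Normal equations: (X'X + P) beta = X'y, with P = diag(0,...,0, lam,...,lam).
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(x, y)) for i in range(p)]
    for j in range(n_clinical, p):
        xtx[j][j] += lam  # penalize genomic coefficients, not clinical ones
    return solve_linear_system(xtx, xty)


# Toy data: column 0 is "clinical", columns 1 and 2 are "genomic".
toy_x = [
    [1.0, 0.5, -1.0],
    [2.0, -0.5, 0.0],
    [3.0, 1.0, 1.0],
    [4.0, -1.0, 0.5],
    [5.0, 0.0, -0.5],
]
toy_y = [2.0 * row[0] + row[1] for row in toy_x]

beta_unpenalized = partially_penalized_ridge(toy_x, toy_y, n_clinical=1, lam=0.0)
beta_shrunk = partially_penalized_ridge(toy_x, toy_y, n_clinical=1, lam=1e6)
print(beta_unpenalized, beta_shrunk)
```

With `lam=0.0` the fit reduces to ordinary least squares; with a very large `lam` the genomic coefficients are driven towards zero while the clinical coefficient stays free, which is the behaviour a clinico-genomic ridge model relies on.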

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

NORA - Norwegian Open Research Archives

Stromal Genes Add Prognostic Information to Proliferation and Histoclinical Markers: A Basis for the Next Generation of Breast Cancer Gene Signatures

Author: A Bergamaschi
A Goldhirsch
A Rody
AE Teschendorff
AH Beck
B Haibe-Kains
B Weigelt
C Desmedt
C Sotiriou
CM Perou
D Dunkler
D Mefford
Dwain Mefford
F Reyal
FC Geyer
G Alexe
G Bianchini
G Finak
H Dai
J Cuzick
Joel Mefford
JS Ross
K Chin
KJ Martin
L Harris
L van’t Veer
LD Miller
LJ van’t Veer
M Allinen
M Ringner
M Schmidt
MC Abba
MCU Cheang
MV Fournier
P Edén
P Farmer
P Wirapati
PE Colombo
RB West
S Loi
S Meng
S Myhre
Syed A. Aziz
T Karn
T Sørlie
WF Symmans
X Zhao
Y Drier
Y Pawitan
Y Wang
Publication venue: Public Library of Science
Publication date: 18/06/2012

BACKGROUND: First-generation gene signatures that identify breast cancer patients at risk of recurrence are confined to estrogen-positive cases and are driven by genes involved in the cell cycle and proliferation. Previously we induced sets of stromal genes that are prognostic for both estrogen-positive and estrogen-negative samples. Creating risk-management tools that incorporate these stromal signatures, along with existing proliferation-based signatures and established clinicopathological measures such as lymph node status and tumor size, should better identify women at greatest risk for metastasis and death. METHODOLOGY/PRINCIPAL FINDINGS: To investigate the strength and independence of the stromal and proliferation factors in estrogen-positive and estrogen-negative patients we constructed multivariate Cox proportional hazards models along with tree-based partitions of cancer cases for four breast cancer cohorts. Two sets of stromal genes, one consisting of DCN and FBLN1, and the other containing LAMA2, add substantial prognostic value to the proliferation signal and to clinical measures. For estrogen receptor-positive patients, the stromal-decorin set adds prognostic value independent of proliferation for three of the four datasets. For estrogen receptor-negative patients, the stromal-laminin set significantly adds prognostic value in two datasets, and marginally in a third. The stromal sets are most prognostic for the unselected population studies and may depend on the age distribution of the cohorts. CONCLUSION: The addition of stromal genes would measurably improve the performance of proliferation-based first-generation gene signatures, especially for older women. Incorporating indicators of the state of stromal cell types would mark a conceptual shift from epithelial-centric risk assessment to assessment based on the multiple cell types in the cancer-altered tissue.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

FigShare

Do Two Machine-Learning Based Prognostic Signatures for Breast Cancer Capture the Same Biological Processes?

Author: A Dupuy
A Farcomeni
A Subramanian
A Tanay
AE Teschendorff
AH Sims
AJ Minn
B Haibe-Kains
B van der Vegt
C Fan
C Rosty
C Sotiriou
CM Perou
D Barrell
D Dunkler
DF Ransohoff
DT Ross
Eytan Domany
F de Snoo
F Reyal
G Dennis Jr
GV Glinsky
HY Chang
JM Bland
JM Taylor
JT Chi
JX Yu
KJ Bussey
L Ein-Dor
L Pusztai
LD Miller
LJ van't Veer
M Ashburner
M Buyse
MA Troester
MJ van de Vijver
ML Whitfield
P Eden
R Clarke
R Radpour
R Shamir
R Shen
R Simon
RA Fisher
S Koscielny
S Michiels
S Paik
S Takahashi
SF Altschul
SY Kim
SY Rhee
W Huang da
Wael El-Rifai
X Sole
Y Benjamini
Y Wang
Yotam Drier
Publication venue: Public Library of Science
Publication date: 01/03/2011

It is well documented that there is very little, if any, overlap between the genes of different prognostic signatures for early-discovery breast cancer. This apparent discrepancy has been attributed to the limits of simple machine-learning identification and ranking techniques, and the biological relevance and meaning of the prognostic gene lists were questioned. Subsequently, proponents of the prognostic gene lists claimed that different lists do capture similar underlying biological processes and pathways. The present study places under scrutiny the validity of this claim for two important gene lists that are at the focus of current large-scale validation efforts. We performed careful enrichment analysis, controlling the effects of multiple testing in a manner which takes into account the nested dependent structure of gene ontologies. In contradiction to several previous publications, we find that the only biological process or pathway for which statistically significant concordance can be claimed is cell proliferation, a process whose relevance and prognostic value were well known long before gene expression profiling. We found that the claims reported by others, of wider concordance between the biological processes captured by the two prognostic signatures studied, either lacked statistical rigor or were in fact based on addressing some other question.

Crossref

Directory of Open Access Journals

PubMed Central

Molecular profiling currently offers no more than tumour morphology and basic immunohistochemistry

Author: A Abdullah-Sayani
AV Ivshina
B Weigelt
Britta Weigelt
C Desmedt
C Fan
C Kim
C Sotiriou
CM Perou
D Dunkler
F Correa Geyer
F Reyal
HY Chang
J Cuzick
J Peppercorn
JD Brenton
JJ de Ronde
Jorge S Reis-Filho
JP Ioannidis
JS Parker
JS Reis-Filho
L Ein-Dor
L Lusa
L Pusztai
LJ van 't Veer
LJ van't Veer
MA Lopez-Garcia
MJ van de Vijver
P Autier
P Wirapati
S Michiels
S Mook
S Paik
SA Aparicio
T Sorlie
Y Wang
YD He
Z Hu
Publication venue: BioMed Central

Crossref

PubMed Central

Breast cancer prognostic classification in the molecular era: the role of histological grade

Author: A Goldhirsch
AN Mirza
Andrea L Richardson
B Weigelt
BW Davis
C Desmedt
C Fan
C Sotiriou
C Williams
CW Elston
D Dunkler
David J Dabbs
DE Henson
DE Rivadeneira
DL Page
DM Abd El-Rehim
DS Hopton
DS Nuyten
EA Rakha
Emad A Rakha
EO Hanrahan
European Commission
F Correa Geyer
F Theissig
FA Tavassoli
Fernando C Schmitt
Frederick Baehner
Gary M Tse
H Denley
H Pereira
HF Frierson Jr
HY Chang
I Balslev
Ian O Ellis
IO Ellis
J Hornberger
J Jacquemier
J Kollias
J Lundin
J Peppercorn
J Warwick
JF Simpson
JM Bueno-de-Mesquita
Jocelyne Jacquemier
Jorge S Reis-Filho
José Palacios
JS Thomas
K Chin
K Yu
L Lusa
L Pedersen
M Saimura
M Sikka
M Sundquist
MH Galea
MJ Ellis
MJ van de Vijver
N Chowdhury
P Boiesen
P Eden
P Robbins
P Wirapati
PL Fitzgibbons
Puay-Hoon Tan
R Natrajan
RA Walker
RW Blamey
S Amat
S Frkovic-Grazio
S Mook
S Paik
SB Edge
SE Singletary
Shu Ichihara
Stephen B Fox
Stuart J Schnitt
Sunil Badve
Sunil R Lakhani
T Sorlie
TA Longacre
Thomas Decker
TJ Anderson
V Le Doussal
Vincenzo Eusebi
W Reed
X Lu
Publication venue: BioMed Central
Publication date: 01/01/2010

Breast cancer is a heterogeneous disease with varied morphological appearances, molecular features, behavior, and response to therapy. Current routine clinical management of breast cancer relies on the availability of robust clinical and pathological prognostic and predictive factors to support clinical and patient decision making in which potentially suitable treatment options are increasingly available. One of the best-established prognostic factors in breast cancer is histological grade, which represents the morphological assessment of tumor biological characteristics and has been shown to be able to generate important information related to the clinical behavior of breast cancers. Genome-wide microarray-based expression profiling studies have unraveled several characteristics of breast cancer biology and have provided further evidence that the biological features captured by histological grade are important in determining tumor behavior. Also, expression profiling studies have generated clinically useful data that have significantly improved our understanding of the biology of breast cancer, and these studies are undergoing evaluation as improved prognostic and predictive tools in clinical practice. Clinical acceptance of these molecular assays will require them to be more than expensive surrogates of established traditional factors such as histological grade. It is essential that they provide additional prognostic or predictive information above and beyond that offered by current parameters. Here, we present an analysis of the validity of histological grade as a prognostic factor and a consensus view on the significance of histological grade and its role in breast cancer classification and staging systems in this era of emerging clinical use of molecular classifiers. © 2010 BioMed Central Lt

Crossref

IUPUIScholarWorks

PubMed Central

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

D-Scholarship@Pitt

Institute of Cancer Research Repository

University of Melbourne Institutional Repository

Fondo Bibliográfico Digital Institucional

University of Queensland eSpace

Nottingham Prognostic Index Plus (NPI+): a modern clinical decision making tool in breast cancer

Author: A Goldhirsch
A Prat
A R Green
AC Wolff
AR Green
C C Nolan
C Lemetre
CM Perou
D Dunkler
D G Powe
D Soria
DM Abd El-Rehim
E A Rakha
EA Rakha
F Ambrogi
G Ball
G Callagy
G D'Eredita
GM Clark
H Goulding
I Balslev
I O Ellis
IA Olivotto
IO Ellis
J Brown
J M Garibaldi
JL Haybittle
JS Parker
KS McCarty Jr
LA Carey
LJ van't Veer
M Friendly
MC Cheang
MH Galea
MJ van de Vijver
MW Beckmann
PM Ravdin
RW Blamey
RW Carlson
S Darb-Esfahani
S Paik
SC Lishman
T Garcia-Caballero
T Sorlie
TO Nielsen
WD Foulkes
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2014

Background: Current management of breast cancer (BC) relies on risk stratification based on well-defined clinicopathologic factors. Global gene expression profiling studies have demonstrated that BC comprises distinct molecular classes with clinical relevance. In this study, we hypothesised that molecular features of BC are a key driver of tumour behaviour and when coupled with a novel and bespoke application of established clinicopathologic prognostic variables can predict both clinical outcome and relevant therapeutic options more accurately than existing methods. Methods: In the current study, a comprehensive panel of biomarkers with relevance to BC was applied to a large and well-characterised series of BC, using immunohistochemistry and different multivariate clustering techniques, to identify the key molecular classes. Subsequently, each class was further stratified using a set of well-defined prognostic clinicopathologic variables. These variables were combined in formulae to prognostically stratify different molecular classes, collectively known as the Nottingham Prognostic Index Plus (NPI+). The NPI+ was then used to predict outcome in the different molecular classes. Results: Seven core molecular classes were identified using a selective panel of 10 biomarkers. Incorporation of clinicopathologic variables in a second-stage analysis resulted in identification of distinct prognostic groups within each molecular class (NPI+). Outcome analysis showed that using the bespoke NPI formulae for each biological BC class provides improved patient outcome stratification superior to the traditional NPI. Conclusion: This study provides proof-of-principle evidence for the use of NPI+ in supporting improved individualised clinical decision making.

Nottingham ePrints

Nottingham eTheses

Crossref

Repository@Nottingham

PubMed Central

Kent Academic Repository

WestminsterResearch